



PRIORITY DOCUMENT 

SUBMITTED OR TRANSMITTED IN 
COMPLIANCE WITH 
RULE 17.1(a) OR (b) 



PCT/6B i004 / 0 U 4 2 7 i 






The Patent Office 
Concept House 
Cardiff Road 
Newport 
South Wales 
NP10 8QQ



REC'D 05 NOV 2004

WIPO

PCT



I the undersigned, being an officer duly authorised in accordance with Section 74(1) and (4)
of the Deregulation & Contracting Out Act 1994, to sign and issue certificates on behalf of the
Comptroller-General, hereby certify that annexed hereto is a true copy of the documents as
originally filed in connection with the patent application identified therein.



In accordance with the Patents (Companies Re-registration) Rules 1982, if a company named
in this certificate and any accompanying documents has re-registered under the Companies Act
1980 with the same name as that with which it was registered immediately before
re-registration save for the substitution as, or inclusion as, the last part of the name of the words
"public limited company" or their equivalents in Welsh, references to the name of the company
in this certificate and any accompanying documents shall be treated as references to the name
with which it is so re-registered.



In accordance with the rules, the words "public limited company" may be replaced by p.l.c.,
plc, P.L.C. or PLC.

Re-registration under the Companies Act does not constitute a new legal entity but merely
subjects the company to certain additional company law rules.






Signed

Dated 22 October 2004 






For Official use only 



Your reference Plug-In Chains (UK) 





0323440.8 



7 OCT 2003



The Patent Office

Request for grant of a patent

Form 1/77

Patents Act 1977



1. Title of Invention



Mark-up language framework with validation 
components 



2. Applicant's details 

First or only applicant

2a If applying as a corporate body: Corporate Name
Symbian Limited
Country
GB

2b If applying as an individual or partnership: Surname
Forenames

2c Address
Sentinel House
16 Harcourt Street
London
UK Postcode W1H 1DS
Country GB
ADP Number




Second applicant (if any)
2d Corporate Name
Country

2e Surname
Forenames

2f Address
UK Postcode
Country
ADP Number




3. Address for service

Agent's Name Origin Limited

Agent's Address 52 Muswell Hill Road
London

Agent's postcode N10 3JR
Agent's ADP Number C03274









4. Reference Number

Plug-In Chains (UK) 



5. Claiming an earlier application date
An earlier filing date is claimed:
Yes □ No ☒

Number of earlier application or patent number

Filing date

15(4) (Divisional) □   8(3) □   12(6) □   37(4) □



6. Declaration of priority

Country of filing Priority Application Number 



Filing Date 



7. Inventorship

The applicant(s) are the sole inventors/joint inventors
Yes □ No ☒



8. Checklist

Continuation sheets
Claims 1   Description 39

Abstract 1   Drawings 6



9. Request

Priority Documents

Translations of Priority Documents

Patents Form 7/77 Yes

Patents Form 9/77 Yes

Patents Form 10/77 Yes



We request the grant of a patent on the basis
of this application.

Signed:
(Origin Limited)



DUPLICATE 



MARK-UP LANGUAGE FRAMEWORK WITH VALIDATION COMPONENTS

DESCRIPTION OF THE PRIOR ART

Mark-up language is a set of codes in a text file that enable a computing device to format
the text for correct display or printing. A client (i.e. any process that requests a service
from another process) in a software system creates mark-up language using a 'generator'.
It reads and interprets mark-up language using a 'parser'.

In the prior art, parsers and generators have been specific to certain kinds of mark-up
languages. For example, a client could use an XML (extensible mark-up language) parser
to interpret and handle XML files; it could use a separate WBXML (WAP binary XML)
parser to interpret and handle WBXML files. In each case, the client would talk directly to
each parser.

When the client needs to generate mark-up language format files, there could be an XML
generator and a separate WBXML generator. Again, the client would talk directly to each
generator.

In addition, each parser or generator would typically also operate with a dedicated
component designed to check and perhaps alter its output. For example, a parser could
send its output to a pre-filtering or a validator component that checks its output. This
pre-filter and/or validator would, as noted above, be dedicated and used solely by the
parser. Hence, if a prior art system has an XML parser and also a WBXML parser, then
it would also have a dedicated XML pre-filter and/or validator and a dedicated WBXML
pre-filter and/or validator.

In prior art systems, clients have had to be hard-coded to handle and talk directly with
these specific kinds of parsers and generators; in practice, this has meant that clients are
either extremely complex (if they need to handle several different mark-up language
formats) or else they are restricted to a single mark-up language format. Further, a
specific parser and validator are hard-coded to work solely with each other.



SUMMARY OF THE PRESENT INVENTION 



The present invention is a portable computing device programmed with a mark-up 
language parser or generator that can access components to validate, pre-filter or alter 
data, in which the components are plug-in components that operate using a chain of 
responsibility. 

Because of the plug-in design of the components, the system is inherently flexible and
extensible compared with prior art systems in which a component (for validating,
pre-filtering or altering data from a parser or generator) would be tied exclusively to a given
parser. Hence, if the mark-up language of a document is extended, or a new one created, it
is possible to write any new or updated validation/pre-filter/altering plug-in that is
needed to work with the extended or new language. These new kinds of
validation/pre-filter/altering plug-ins can be provided for loading onto a device even after that device
has been shipped to an end-user. The 'chain of responsibility' design pattern, whilst
known in object-oriented programming, has not previously been used in the present
context.

The plug-in components may all present a common, generic API to the parser and
generator. Hence, the same plug-in can be used with different types of parsers and
generators (e.g. an XML parser, a WBXML parser, an RTF parser etc.). The plug-ins also
present a common, generic API to a client component using the parser or generator.
Hence, the same plug-ins can be used by different clients.

For example, a DTD validator plug-in could be written that validates the mark-up of a
document and reports errors to the client. Or, for a web browser, an auto correction
plug-in filter could be written that tries to correct errors found in the mark-up language,
such as a missing end element tag or an incorrectly placed element tag. The auto
correction plug-in will, if it can, fix the error transparently to the client. This enables a
web browser still to display a document rather than just displaying an error reporting that
there was an error in the document.

Because the plug-ins can be chained together, complex and varied types of filtering and
validation can take place. In the example above, the parser could notify the validator
plug-in of elements it is parsing; these would in turn go to the auto correction plug-in to
be fixed if required, and finally the client would receive these events.
The mark-up framework allows parser plug-ins to expose the parsed element stack to all
validation/pre-filter/altering plug-ins. (The parsed element stack is a stack populated
with elements extracted from a document as that document is parsed; this stack is made
available to all validation/pre-filter/altering plug-ins to avoid the need to duplicate the
stack for each of these plug-ins.) This also enables the plug-ins to use the stack
information to aid in validation and filtering. For example, an auto corrector plug-in may
need to know the entire list of elements on the stack in order to work out how to fix
a problem.



The use of filter/validator plug-ins in mark-up language generators is especially useful for
developers writing a client to the framework and generating mark-up documents, as the
same validator plug-in used by the parser can be used in the generator. Errors are
reported to the client when the mark-up does not conform to the validator, which will
enable developers to make sure they are writing well-formed mark-up that conforms
to the DTD and to catch errors early on during development.

The mark-up framework incorporates a character conversion module that enables
documents written in different character sets (e.g. ASCII, various Kanji character sets
etc.) to be parsed and converted to UTF-8. This means a client obtains the results from
the parser in a generic way (UTF-8) without having to know the original character set that
was used in the document. Clients hence no longer need to be able to differentiate
between different character sets and handle the different character sets appropriately.



DETAILED DESCRIPTION 



Overview of Key Features 

The present invention is implemented in a system called the Mark-Up Language 
Framework, used in SymbianOS from Symbian Limited, London, United Kingdom. 
SymbianOS is an operating system for smart phones and advanced mobile telephones, 
and other kinds of portable computing devices. 

The Mark-Up Language framework implements three key features. 

1. Generic Parser API

Clients are separated from mark-up language parsers/generators by an intermediary layer 
that (a) insulates the client from having to communicate directly with the parser or 
generator and is (b) generic in that it presents a common API to the client irrespective of 
the specific kind of parser or generator the intermediary layer interfaces with. 

2. Data validation/pre-filtering and altering components in a chain of 
responsibility 

Mark-up language parsers or generators can access components to validate, pre-filter or
alter data; the components are plug-in components that operate using a 'chain of
responsibility' design pattern.

3. Generic Data Supplier API 

The mark-up language parsers or generators can access data from a source using a
generic data supplier API, insulating the parser or generator from having to
communicate directly with the data source.

Each of these features will now be discussed in more detail.

1. Generic Parser Intermediary Layer 

The essence of this approach is that the client interfaces with a mark-up language
parser or generator via an intermediary layer that (a) insulates the client from having to
communicate directly with the parser or generator and (b) is generic in that it presents a
common API to the client irrespective of the specific kind of parser or generator the
intermediary layer interfaces with.

In this way, the client is no longer tied to a single kind of parser or generator; it can
operate with any kind of parser compatible with the intermediary layer, yet it
remains far simpler than prior art clients that are hard-coded to operate directly with
several different kinds of parsers and generators.

The API is typically implemented as a header file. In an implementation, the
intermediary layer acts as an extensible framework and the parsers and generators are
themselves plug-ins to that framework. The present invention may hence readily allow
the device to operate with different kinds of parsers and generators: this extensibility is
impossible to achieve with prior art hard-coded systems.

The specific kind of parser or generator being used is not known to the client: the
intermediary layer fully insulates the client from needing to be aware of these specifics.
Instead, the client deals only with the intermediary layer, which presents itself to the client as a
generic parser or a generic generator, i.e. a parser or generator which behaves in a way
that is common to all parsers or generators.
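To make the intermediary layer concrete, here is a minimal sketch in standard C++17 rather than Symbian C++. The names `Parser`, `ParseCallback`, `ParserRegistry` and the stub parser classes are hypothetical, not the framework's API; the stubs stand in for real XML and WBXML parser plug-ins, and selecting a parser by MIME type is assumed by analogy with the `aMarkupMimeType` parameter of `RParserSession::OpenL`.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event interface that a client implements once.
struct ParseCallback {
    virtual ~ParseCallback() = default;
    virtual void OnStartElement(const std::string& name) = 0;
};

// Hypothetical generic parser interface presented by the intermediary layer.
struct Parser {
    virtual ~Parser() = default;
    virtual void Parse(const std::string& data, ParseCallback& cb) = 0;
};

// Stand-in parser plug-ins: they only tag the input so a caller can tell
// which one ran; real plug-ins would decode XML or WBXML.
struct StubXmlParser : Parser {
    void Parse(const std::string& data, ParseCallback& cb) override { cb.OnStartElement("xml:" + data); }
};
struct StubWbxmlParser : Parser {
    void Parse(const std::string& data, ParseCallback& cb) override { cb.OnStartElement("wbxml:" + data); }
};

// The intermediary layer: picks a parser plug-in by mark-up MIME type,
// so the client never names a concrete parser class.
class ParserRegistry {
public:
    using Factory = std::function<std::unique_ptr<Parser>()>;
    void Register(const std::string& mimeType, Factory factory) {
        factories_[mimeType] = std::move(factory);
    }
    std::unique_ptr<Parser> Open(const std::string& mimeType) const {
        auto it = factories_.find(mimeType);
        if (it == factories_.end()) throw std::runtime_error("no parser for " + mimeType);
        return it->second();
    }
private:
    std::map<std::string, Factory> factories_;
};

// A client written against the generic interface only.
struct RecordingClient : ParseCallback {
    std::vector<std::string> seen;
    void OnStartElement(const std::string& name) override { seen.push_back(name); }
};
```

Supporting a further mark-up language then amounts to one more `Register` call; the client code is unchanged.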

For example, the SyncML protocol supports both XML and WBXML. By plugging both
XML and WBXML parser and generator plug-ins into the framework, a SyncML client
can use either or both types of parser and generator without knowing about the type of
mark-up language; as a result, the design of the SyncML client is greatly simplified. Since
WBXML and XML are quite different in the way they represent their data, one very
useful feature of the invention is the mapping of WBXML tokens to a string in a static
string pool table. Appendix B expands on this idea.

The present invention may provide a flexible and extensible file conversion system: for
example, the device could parse a document written in one mark-up language format and
then use the parsed document data to generate an equivalent document in a different file
format. Because of the extensible plug-in design of an implementation of the system, it
is possible to provide a far wider range of file conversion capabilities than was previously
the case. New kinds of parsers and generators can be provided for loading onto a device
after that device has been shipped to an end-user. The only requirement is that they are
compatible with the intermediary layer.
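The conversion path described above can be pictured as one event stream joining a source-format parser to a target-format generator. This is a minimal standard C++17 illustration under assumed names (`MarkupEvents`, `Replay`, `XmlTextGenerator`); it does not reproduce the framework's event interface, and the stand-in parser replays pre-tokenised input rather than decoding a real WBXML byte stream.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event interface shared by parsers (producers) and
// generators (consumers).
struct MarkupEvents {
    virtual ~MarkupEvents() = default;
    virtual void OnStartElement(const std::string& name) = 0;
    virtual void OnContent(const std::string& text) = 0;
    virtual void OnEndElement(const std::string& name) = 0;
};

// Stand-in for a source-format parser: replays an already tokenised
// document as events.
enum class Kind { Start, Text, End };
using TokenisedDoc = std::vector<std::pair<Kind, std::string>>;

void Replay(const TokenisedDoc& doc, MarkupEvents& out) {
    for (const auto& [kind, value] : doc) {
        switch (kind) {
            case Kind::Start: out.OnStartElement(value); break;
            case Kind::Text:  out.OnContent(value); break;
            case Kind::End:   out.OnEndElement(value); break;
        }
    }
}

// Minimal target-format generator driven by the same events
// (no escaping or attributes, to keep the sketch short).
struct XmlTextGenerator : MarkupEvents {
    std::string output;
    void OnStartElement(const std::string& name) override { output += "<" + name + ">"; }
    void OnContent(const std::string& text) override { output += text; }
    void OnEndElement(const std::string& name) override { output += "</" + name + ">"; }
};
```

Because the generator only sees events, any parser plug-in that emits them could feed it, which is the extensibility argument made above.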



Another advantage of the present invention is that it allows not only different parsers
and generators to be readily used by the same client, but also several different
clients to share the same parsers and generators. The API may itself be
extensible, so that extensions to its capabilities (e.g. to enable a new/extended mark-up
language of a document to be handled) can be made without affecting compatibility with
existing clients or existing parsers and generators. Similarly, new kinds of clients can be
provided for loading onto a device after that device has been shipped to an end-user.
The only requirement is that they are compatible with the intermediary layer.

2. Data validation/pre-filtering and altering components in a chain of
responsibility

The essence of this approach is that the mark-up language parser or generator can access 
components to validate, pre-filter or alter data, in which the components are plug-in 
components that operate using a chain of responsibility. 

Because of the plug-in design of the components, the system is inherently flexible and
extensible compared with prior art systems in which a component (for validating,
pre-filtering or altering data from a parser or generator) would be tied exclusively to a given
parser. Hence, if the mark-up language of a document is extended, or a new one created, it
is possible to write any new validation/pre-filter/altering plug-in that is needed to work
with the extended or new language. These new kinds of validation/pre-filter/altering
plug-ins can be provided for loading onto a device even after that device has been
shipped to an end-user. The 'chain of responsibility' design pattern, whilst known in
object-oriented programming, has not previously been used in the present context.

The plug-in components may all present a common, generic API to the parser and
generator. Hence, the same plug-in can be used with different types of parsers and
generators (e.g. an XML parser, a WBXML parser, an RTF parser etc.). The plug-ins also
present a common, generic API to a client component using the parser or generator.
Hence, the same plug-ins can be used by different clients.

For example, a DTD validator plug-in could be written that validates the mark-up of a
document and reports errors to the client. Or, for a web browser, an auto correction
plug-in filter could be written that tries to correct errors found in the mark-up language,
such as a missing end element tag or an incorrectly placed element tag. The auto
correction plug-in will, if it can, fix the error transparently to the client. This enables a
web browser still to display a document rather than just displaying an error reporting that
there was an error in the document.

Because the plug-ins can be chained together, complex and varied types of filtering and
validation can take place. In the example above, the parser could notify the validator
plug-in of elements it is parsing; these would in turn go to the auto correction plug-in to
be fixed if required, and finally the client would receive these events.
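A minimal sketch of the chain of responsibility described above, in standard C++17. `ElementSink`, `ChainedPlugin`, `TracingPlugin` and `TracingClient` are hypothetical names, not framework classes: each link handles an event and forwards it to the next, so the parser only needs to know the first plug-in and the client is simply the last sink.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sink for start-element events; the client and every
// plug-in implement it, so any of them can be the next link.
struct ElementSink {
    virtual ~ElementSink() = default;
    virtual void OnStartElement(const std::string& name) = 0;
};

// Base class for one link in the chain: handle the event, then hand it
// to whatever sink was configured as "next".
class ChainedPlugin : public ElementSink {
public:
    void SetNext(ElementSink& next) { next_ = &next; }
protected:
    void Forward(const std::string& name) {
        if (next_ != nullptr) next_->OnStartElement(name);
    }
private:
    ElementSink* next_ = nullptr;
};

// Example link that only records what it saw before forwarding; a real
// validator or auto corrector would inspect or repair the event here.
class TracingPlugin : public ChainedPlugin {
public:
    TracingPlugin(std::string label, std::vector<std::string>& trace)
        : label_(std::move(label)), trace_(trace) {}
    void OnStartElement(const std::string& name) override {
        trace_.push_back(label_ + ":" + name);
        Forward(name);
    }
private:
    std::string label_;
    std::vector<std::string>& trace_;
};

// The client is the final sink in the chain.
class TracingClient : public ElementSink {
public:
    explicit TracingClient(std::vector<std::string>& trace) : trace_(trace) {}
    void OnStartElement(const std::string& name) override { trace_.push_back("client:" + name); }
private:
    std::vector<std::string>& trace_;
};
```

Wiring `validator -> corrector -> client` and sending one event to the validator produces the order validator, corrector, client; reordering or inserting links needs no change to the parser or the client.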

The mark-up framework allows parser plug-ins to expose the parsed element stack to all
validation/pre-filter/altering plug-ins. (The parsed element stack is a stack populated
with elements extracted from a document as that document is parsed; this stack is made
available to all validation/pre-filter/altering plug-ins to avoid the need to duplicate the
stack for each of these plug-ins.) This also enables the plug-ins to use the stack
information to aid in validation and filtering. For example, an auto corrector plug-in may
need to know the entire list of elements on the stack in order to work out how to fix
a problem.
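As an illustration of why a shared element stack helps an auto corrector, the following standard C++17 sketch lets a plug-in read the parser's stack through a const reference (no copy) and work out which open elements a premature end tag implicitly closes. `ElementStack` and `ImpliedEndTags` are hypothetical names, and this repair rule is only one plausible use of the stack.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical shared element stack: the parser pushes and pops as it
// goes; plug-ins only ever receive a const reference to it.
class ElementStack {
public:
    void Push(const std::string& name) { items_.push_back(name); }
    void Pop() { items_.pop_back(); }
    const std::vector<std::string>& Items() const { return items_; }
private:
    std::vector<std::string> items_;
};

// Hypothetical auto-corrector helper: for an end tag that does not match
// the top of the stack, list (innermost first) the open elements that
// must be closed before it. Returns an empty list if the end tag matches
// the top, or if no matching open element exists.
std::vector<std::string> ImpliedEndTags(const ElementStack& stack, const std::string& endTag) {
    const auto& items = stack.Items();
    std::vector<std::string> implied;
    for (auto it = items.rbegin(); it != items.rend(); ++it) {
        if (*it == endTag) return implied;  // found the matching open element
        implied.push_back(*it);             // this element was never closed
    }
    return {};  // nothing on the stack matches: no safe repair
}
```

With `html body p b` open, an unexpected `</body>` implies closing `b` and then `p`, which an auto corrector could inject before forwarding the event.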

The use of filter/validator plug-ins in mark-up language generators is especially useful for
developers writing a client to the framework and generating mark-up documents, as the
same validator plug-in used by the parser can be used in the generator. Errors are
reported to the client when the mark-up does not conform to the validator, which will
enable developers to make sure they are writing well-formed mark-up that conforms
to the DTD and to catch errors early on during development.
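The following standard C++17 sketch shows one validator object being consulted by a generator, so that errors are reported while a document is being built. The simplified content model (element name mapped to the set of allowed child names) is an assumption standing in for a real DTD, and `ChildValidator` and `CheckedGenerator` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified DTD: element name -> allowed children.
using ContentModel = std::map<std::string, std::set<std::string>>;

// The validator only sees (parent, child) pairs, so the same object could
// sit behind a parser or a generator.
class ChildValidator {
public:
    explicit ChildValidator(ContentModel model) : model_(std::move(model)) {}
    bool Allowed(const std::string& parent, const std::string& child) const {
        auto it = model_.find(parent);
        return it != model_.end() && it->second.count(child) > 0;
    }
private:
    ContentModel model_;
};

// Hypothetical generator that consults the validator before emitting each
// element and collects errors for the client instead of failing.
class CheckedGenerator {
public:
    explicit CheckedGenerator(const ChildValidator& v) : validator_(v) {}
    void StartElement(const std::string& name) {
        if (!open_.empty() && !validator_.Allowed(open_.back(), name))
            errors.push_back(name + " not allowed in " + open_.back());
        open_.push_back(name);
        output += "<" + name + ">";
    }
    void EndElement() {
        if (open_.empty()) return;  // nothing open: ignore
        output += "</" + open_.back() + ">";
        open_.pop_back();
    }
    std::string output;
    std::vector<std::string> errors;
private:
    const ChildValidator& validator_;
    std::vector<std::string> open_;
};
```

In this sketch the document is still generated and the error is only reported; whether the framework's generator also rejects such output is not stated in the text.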

The mark-up framework incorporates a character conversion module that enables
documents written in different character sets (e.g. ASCII, various Kanji character sets
etc.) to be parsed and converted to UTF-8. This means a client obtains the results from
the parser in a generic way (UTF-8) without having to know the original character set that
was used in the document. Clients hence no longer need to be able to differentiate
between different character sets and handle the different character sets appropriately.
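Real Kanji encodings such as Shift_JIS need lookup tables, so the sketch below substitutes the simplest non-ASCII case, ISO-8859-1 (Latin-1) to UTF-8, to show what "converted to UTF-8" means at the byte level. It is written in standard C++17 and is not the framework's character conversion module; `Latin1ToUtf8` is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// Latin-1 byte values equal their Unicode code points, so bytes below
// 0x80 copy through unchanged and the rest (0x80-0xFF) become two-byte
// UTF-8 sequences: 110xxxxx 10xxxxxx.
std::string Latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            utf8.push_back(static_cast<char>(c));
        } else {
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return utf8;
}
```

For example, `"caf\xE9"` (Latin-1 "café") becomes `"caf\xC3\xA9"`; a client that only ever sees the UTF-8 form need not know which input encoding was used.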
3. Generic Data Supplier API

The mark-up language parser or generator accesses data from a source using a generic
data supplier API. Hence, the parser or generator is insulated from having to talk directly
to a data source; instead, it does so via the generic data supplier API, which acts as an
intermediary layer. This decouples the parser or generator from the data source and
hence means that the parser or generator no longer has to be hard-coded for a specific
data supplier. This in turn simplifies the design of the parser and generator.

The present invention allows parsing and generation to be carried out with any data
source. For example, a buffer in memory could be used, as could a file, as could
a socket (hence enabling data streamed over the internet to be parsed in real time).
There is no requirement to define, at parser/generator build
time, what particular data source will be used. Instead, the system allows any source that
can use the generic data supplier API to be adopted. New types of data sources can be
utilised by computing devices, even after those devices have been shipped to end-users.
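A minimal standard C++17 sketch of a generic data supplier: the parser-side function `ReadAll` is written once against an abstract reader and works unchanged with an in-memory buffer or any `std::istream` (a file stream, or a socket wrapped in a stream). `DataSupplierReader`, `NextChunk` and the two supplier classes are hypothetical simplifications of the idea behind `MDataSupplierReader`, not its actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical reader interface: the parser only asks for "the next chunk".
class DataSupplierReader {
public:
    virtual ~DataSupplierReader() = default;
    // Returns up to maxBytes of data; an empty string means end of data.
    virtual std::string NextChunk(std::size_t maxBytes) = 0;
};

// Source 1: an in-memory buffer.
class BufferSupplier : public DataSupplierReader {
public:
    explicit BufferSupplier(std::string data) : data_(std::move(data)) {}
    std::string NextChunk(std::size_t maxBytes) override {
        std::string chunk = data_.substr(pos_, maxBytes);
        pos_ += chunk.size();
        return chunk;
    }
private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Source 2: any input stream.
class StreamSupplier : public DataSupplierReader {
public:
    explicit StreamSupplier(std::istream& in) : in_(in) {}
    std::string NextChunk(std::size_t maxBytes) override {
        std::string chunk(maxBytes, '\0');
        in_.read(&chunk[0], static_cast<std::streamsize>(maxBytes));
        chunk.resize(static_cast<std::size_t>(in_.gcount()));  // keep only bytes actually read
        return chunk;
    }
private:
    std::istream& in_;
};

// Parser side: written once, against the interface only. A real parser
// would tokenise each chunk instead of concatenating it.
std::string ReadAll(DataSupplierReader& reader) {
    std::string all, chunk;
    while (!(chunk = reader.NextChunk(4)).empty()) all += chunk;
    return all;
}
```

A new kind of source only has to implement `NextChunk`; `ReadAll` is not recompiled or changed.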

The present invention is implemented in a system called the Mark-Up Language 
Framework, used in SymbianOS from Symbian Limited, London, United Kingdom. 
SymbianOS is an operating system for smart phones and advanced mobile telephones, 
and other kinds of portable computing devices. 

Appendix 1 describes the Mark-Up Language Framework in more detail. Appendix 2
describes a particular technique, referred to as 'String Pool', which is used in the
Mark-Up Language Framework. The appendices refer to various SymbianOS-specific
programming techniques and structures. There is an extensive published literature
describing these techniques; reference may for example be made to "Professional
Symbian Programming", Wrox Press Inc., ISBN 186100303X.
Introduction
Purpose and Scope

This document describes the architecture for a generic mark-up framework. The
framework is extendable by using plug-ins so that different mark-up parsers and generators
(e.g. XML[1], WBXML[2]) can be used.

Design Overview 
Block Diagrams 

The mark-up framework block diagram is shown in the accompanying drawing. The Client is the
application using the mark-up framework for parsing or generating a document. The Parser
and Generator components are specific to a mark-up language (e.g. XML or WBXML). These
components use the Namespace Collection to retrieve information about a specific namespace
during the parsing or generating phase.

The Namespace Plug-in component is an ECOM plug-in that sets up all the elements,
attributes and attribute values for a namespace. For each namespace used there must be a
plug-in that describes the namespace. The namespace information is stored in a string
pool. The string pool is a way of storing strings that makes comparison almost
instantaneous at the expense of string creation. It is particularly efficient at handling
string constants that are known at compile time, which makes it very suitable for
processing documents. The Namespace owns the string pool that the Parser,
Generator and Client can gain access to.
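The string pool trade-off (slower creation, near-instant comparison) can be illustrated with interning in standard C++17. This `StringPool` class is a hypothetical sketch, not Symbian's string pool API: each distinct string is stored once and identified by an integer handle, so equality tests compare handles rather than characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

class StringPool {
public:
    using Handle = std::size_t;

    // Creation is the expensive step: a hash lookup and possibly an insert.
    Handle Intern(const std::string& s) {
        auto it = index_.find(s);
        if (it != index_.end()) return it->second;
        strings_.push_back(s);
        Handle h = strings_.size() - 1;
        index_.emplace(s, h);
        return h;
    }
    // After interning, equal strings have equal handles, so comparison
    // is a single integer compare.
    const std::string& Text(Handle h) const { return strings_.at(h); }
    std::size_t Size() const { return strings_.size(); }

private:
    std::vector<std::string> strings_;
    std::unordered_map<std::string, Handle> index_;
};
```

A namespace plug-in would intern its known element and attribute names once, up front, after which every comparison made while processing a document is cheap.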

The Namespace Plug-in simply sets up the string pool with the required strings for the
namespace the plug-in represents. The Client may get access to the Namespace 
Collection via the Parser or Generator to pre-load namespaces prior to parsing or 
generating documents which may speed up the parsing or generating session. 

The Plug-in components (1 to 4) are optional and allow further processing of the data
before the client receives it, such as DTD validators or document auto correctors.
Validators check that the elements and attributes conform to the DTD. Document auto
correction plug-ins are used to try to correct errors reported from DTD validators.
The parser is event driven and sends events to the various plug-ins and UI during
parsing. The accompanying drawing shows a client parsing with a DTD
validator and auto corrector. The client talks to the parser directly to start the parse. The
parser sends events to the chain of plug-ins. The first plug-in that receives events is the
DTD validator plug-in. This plug-in validates that the data in the event it received is
correct. If it is not correct, it sends the auto corrector the same event that the parser sent
to the validator, with the addition of an error code describing the problem the validator
encountered. If the event data is valid, the same event is sent to the auto corrector.
Now the auto corrector receives the event and can check for any errors. If there is an
error it can attempt to correct it. If it can correct the error, it modifies the data in the
event and removes the error code before sending the event to the client. The client finally
receives the event and can now handle it.
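The event flow just described, in which the validator attaches an error code and the auto corrector may repair the data and clear it, can be sketched as follows in standard C++17. `StartElementEvent`, `kErrUnknownElement` and the single repair rule (fixing letter case) are assumptions made for illustration; real plug-ins would apply DTD rules.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical event passed down the chain; error == 0 means "no problem".
struct StartElementEvent {
    std::string name;
    int error = 0;
};
constexpr int kErrUnknownElement = -1;  // hypothetical error code

struct Sink {
    virtual ~Sink() = default;
    virtual void OnStartElement(StartElementEvent ev) = 0;
};

// Validator: forwards the same event, adding an error code if invalid.
class Validator : public Sink {
public:
    Validator(const std::set<std::string>& known, Sink& next) : known_(known), next_(next) {}
    void OnStartElement(StartElementEvent ev) override {
        if (known_.count(ev.name) == 0) ev.error = kErrUnknownElement;
        next_.OnStartElement(ev);
    }
private:
    const std::set<std::string>& known_;
    Sink& next_;
};

// Auto corrector: tries one simple repair (wrong letter case); on success
// it rewrites the event data and clears the error code.
class AutoCorrector : public Sink {
public:
    AutoCorrector(const std::set<std::string>& known, Sink& next) : known_(known), next_(next) {}
    void OnStartElement(StartElementEvent ev) override {
        if (ev.error == kErrUnknownElement) {
            std::string lower = ev.name;
            std::transform(lower.begin(), lower.end(), lower.begin(),
                           [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
            if (known_.count(lower) != 0) { ev.name = lower; ev.error = 0; }
        }
        next_.OnStartElement(ev);
    }
private:
    const std::set<std::string>& known_;
    Sink& next_;
};

// The client receives whatever survives the chain, errors included.
struct CollectingClient : Sink {
    std::vector<StartElementEvent> received;
    void OnStartElement(StartElementEvent ev) override { received.push_back(ev); }
};
```

An event named `P` arrives at the client repaired as `p` with no error, while an unrepairable `blink` still carries the error code, so the client can decide how to report it.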

A further drawing illustrates a client generating using DTD validator and auto corrector
plug-ins. A real client would probably never use a generator with an auto corrector, since the
data the client generates should always be valid, but one is used here to show the flow of
events from a generator and any plug-ins attached.

The client sends a build request to the generator. The first thing the generator does is to
send the request as an event to the DTD validator plug-in. The situation is similar to the
parser: the DTD validator plug-in validates that the data in the event it received is
correct. If it is not correct, it sends the auto corrector the same event that the generator sent
to the validator, with the addition of an error code describing the problem the validator
encountered. If the event data is valid, the same event is sent to the auto corrector.
Now the auto corrector receives the event and can check for any errors. If there is an
error it can attempt to correct it. If it can correct the error, it modifies the data in the
event and removes the error code before sending the event back to the generator. The
major difference between the events during parsing and generating is that, when generating,
once the final plug-in has dealt with the event it is sent back to the generator. The
generator receives the event and builds up part of the document using the details from
the event.
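To show the one structural difference when generating, namely that the last plug-in hands the event back to the generator rather than to the client, here is a standard C++17 sketch with hypothetical names (`GenEvent`, `GenSink`, `Generator`, `NonEmptyCheck`). Counting rejected events instead of writing them is an assumption; the text does not say what the generator does with an event that still carries an error code.

```cpp
#include <cassert>
#include <string>

struct GenEvent {
    std::string name;
    int error = 0;  // 0 means no problem
};

struct GenSink {
    virtual ~GenSink() = default;
    virtual void OnStartElement(GenEvent ev) = 0;
};

class Generator : public GenSink {
public:
    void SetFirstPlugin(GenSink& first) { first_ = &first; }
    // Client build request: wrapped as an event and pushed into the chain.
    void BuildStartElement(const std::string& name) {
        GenEvent ev{name};
        if (first_ != nullptr) first_->OnStartElement(ev);
        else OnStartElement(ev);  // no plug-ins: handle it directly
    }
    // Called by the last plug-in; only now is document text produced.
    void OnStartElement(GenEvent ev) override {
        if (ev.error == 0) output += "<" + ev.name + ">";
        else ++rejected;
    }
    std::string output;
    int rejected = 0;
private:
    GenSink* first_ = nullptr;
};

// Trivial plug-in that flags empty element names and passes events on;
// here its "next" is the generator itself, closing the loop.
class NonEmptyCheck : public GenSink {
public:
    explicit NonEmptyCheck(GenSink& next) : next_(next) {}
    void OnStartElement(GenEvent ev) override {
        if (ev.name.empty()) ev.error = -1;
        next_.OnStartElement(ev);
    }
private:
    GenSink& next_;
};
```

The generator acts as both the head of the chain (via `BuildStartElement`) and its tail (via `OnStartElement`), which mirrors the round trip described above.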



Parsing and Generating WBXML

Parsing WBXML is quite different from parsing XML or HTML. The main difference is that
elements and attributes are defined as tokens rather than by their text representation.
This means a mapping needs to be stored between a WBXML token and its static string
representation. The Namespace plug-in for a particular namespace stores these
mappings. A WBXML parser and generator can then obtain a string from the
namespace plug-in given the WBXML token, and vice versa.
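A sketch of the two-way token/string mapping a namespace plug-in could hold, in standard C++17. The token values are illustrative, not the real SyncML code-page assignments, plain `std::string` stands in for static string-pool entries, and `TokenTable` is a hypothetical name. The masking step reflects the WBXML encoding, in which bit 7 of a tag byte flags attributes and bit 6 flags content, leaving the low six bits as the tag identity.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

class TokenTable {
public:
    void Add(std::uint8_t token, const std::string& name) {
        byToken_[token] = name;
        byName_[name] = token;
    }
    // Parser direction: strip the attribute (0x80) and content (0x40)
    // flag bits, then look up the remaining tag identity.
    std::optional<std::string> NameFor(std::uint8_t tagByte) const {
        auto it = byToken_.find(static_cast<std::uint8_t>(tagByte & 0x3F));
        if (it == byToken_.end()) return std::nullopt;
        return it->second;
    }
    // Generator direction: element name to token.
    std::optional<std::uint8_t> TokenFor(const std::string& name) const {
        auto it = byName_.find(name);
        if (it == byName_.end()) return std::nullopt;
        return it->second;
    }
private:
    std::map<std::uint8_t, std::string> byToken_;
    std::map<std::string, std::uint8_t> byName_;
};
```

The same table serves both the WBXML parser (token to name) and the WBXML generator (name to token), which is why it lives in the shared namespace plug-in rather than in either of them.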

Class Diagram

The class diagram for the mark-up framework is shown in the accompanying drawing. The diagram
also depicts plug-ins that make use of the framework. The green (or, in black and white, dark
grey) classes are the plug-ins that provide implementations to the mark-up framework.
CXmlParser and CWbxmlParser provide implementations to parse XML and WBXML documents
respectively. In the same way, CXmlGenerator and CWbxmlGenerator generate XML and WBXML
documents respectively. CValidator is a plug-in that validates the mark-up document during
parsing or generating.

CAutoCorrector is a plug-in that corrects invalid mark-up documents.
When parsing a document, and the client receives events for the start of an element for
example (OnStartElementL), the element RString in the event is a handle to a string in
the string pool. If this is a known string, i.e. one that has been added by the Namespace
Plug-in, then the string will be static. Otherwise, if it is an unknown string, the parser will
add the string to the string pool as a dynamic string and return an RString with a handle
to this string. It is not possible to know whether an RString is dynamic or static, so the parser or
generator that obtains an RString must be sure to close it to ensure any memory is
released if the string is dynamic. A client that wishes to use the RString after the event
returns to the parser must make a copy of it, which will increase the reference count and
make sure it is not deleted when the parser closes it.
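The ownership rule above can be modelled with explicit reference counts. The standard C++17 sketch below is hypothetical (`RefPool`, `Copy` and `Close` are illustrative names, not the Symbian `RString` API): closing a static string is a no-op, while a dynamic string survives the parser's close only because the client took a copy first.

```cpp
#include <cassert>
#include <map>
#include <string>

class RefPool {
public:
    struct Handle { int id = -1; };

    Handle AddStatic(const std::string& s) { return Add(s, /*isStatic=*/true); }
    Handle AddDynamic(const std::string& s) { return Add(s, /*isStatic=*/false); }

    // Copying a handle increases the reference count.
    Handle Copy(Handle h) { ++entries_.at(h.id).refs; return h; }

    // Closing is always safe for the caller: static strings are never
    // freed; a dynamic string is erased when its last handle is closed.
    void Close(Handle h) {
        Entry& e = entries_.at(h.id);
        if (e.isStatic) return;
        if (--e.refs == 0) entries_.erase(h.id);
    }
    bool Alive(Handle h) const { return entries_.count(h.id) != 0; }
    const std::string& Text(Handle h) const { return entries_.at(h.id).text; }

private:
    struct Entry { std::string text; bool isStatic; int refs; };
    Handle Add(const std::string& s, bool isStatic) {
        entries_.emplace(next_, Entry{s, isStatic, 1});
        return Handle{next_++};
    }
    std::map<int, Entry> entries_;
    int next_ = 0;
};
```

Because the caller cannot tell static from dynamic strings, the safe pattern in this model is the one the text prescribes: always close what you obtained, and copy anything you want to keep.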

Another accompanying drawing is an example class diagram that shows the major
classes for parsing WBXML SyncML documents. The client creates a
CDescriptorDataSupplier that supplies the data to the parser. CWbxmlParser is the
class that actually parses the document. CSyncMLNamespace is the namespace for
SyncML that the parser uses to map WBXML tokens to strings. All the other classes
belong to the mark-up framework. To parse a document with different namespaces, the
only thing that needs to be added is a plug-in for each namespace.

Class Dictionary 
Object name

MMarkupCallback
    A call-back that a client must implement so that the parser can report
    events back to the client during the parsing session.
    Inherited by clients and plug-ins.

RNamespaceCollection
    Contains a collection of namespaces. Contains a reference counter so that
    multiple parsers or generators may use the same namespace collection.
    Owned by either CParserSession or CGeneratorSession. Owns an array of
    CMarkupNamespace plug-ins.

CMarkupNamespace
    ECOM interface to implement a namespace.
    Inherited by any namespace plug-ins.

RParserSession
    Public interface for a client to create a parser session.
    Owned by the client.

RGeneratorSession
    Public interface for a client to create a generator session.
    Owned by the client.

CMarkupCharSetConverter
    Helper which uses CCnvCharacterSetConverter for the client, parser and
    generator to do any character set conversions or to resolve MIB enums or
    Internet-standard names of character sets.
    Owned by RParserSession and RGeneratorSession.

CMarkupPluginBase
    Generic interface for any type of plug-in.
    Inherited by CMarkupPlugin, CParserSession and CGeneratorSession.

CMarkupPlugin
    ECOM interface for plug-ins to be used by the parser and generator.
    Owned by CParserSession or CGeneratorSession.

MDataSupplierReader
    Pure virtual interface to be implemented by a data supplier for reading data.
    Inherited by the client's data provider.

MDataSupplierWriter
    Pure virtual interface to be implemented by a data supplier for writing data.
    Inherited by the client's data provider.

CParserSession
    ECOM interface for parser plug-ins.
    Inherited by a concrete parser implementation.

CGeneratorSession
    ECOM interface for generator plug-ins.
    Inherited by a concrete generator implementation.

RAttribute
    Contains the name and value of an attribute.
    Used by the parser, generator and client.
The classes below are not part of the framework but illustrate how the framework can be used.

CValidator
    A DTD, schema or some other type of validator.
    Owned by RParserSession or RGeneratorSession.

CAutoCorrector
    Used to auto correct invalid data.
    Owned by RParserSession or RGeneratorSession.

CXmlParser
    An XML parser implementation.
    Owned by RParserSession.

CWbxmlParser
    A WBXML parser implementation.
    Owned by RParserSession.

CXmlGenerator
    An XML generator implementation.
    Owned by RGeneratorSession.

CWbxmlGenerator
    A WBXML generator implementation.
    Owned by RGeneratorSession.

CNamespace
    A namespace plug-in to use with a parser and generator.
    Owned by RNamespaceCollection.

RElementStack
    A stack of the currently processed elements during parsing or generating.
    Owned by CParserSession and CGeneratorSession.



Detailed Design 
RParserSession



The following is the public API for this class: 



Method
    Description

void OpenL(MDataSupplierReader& aReader,
           const TDesC8& aMarkupMimeType,
           const TDesC8& aDocumentMimeType,
           MMarkupCallback& aCallback)
    Opens a parser session.
    aReader is the data supplier reader to use during parsing.
    aMarkupMimeType is the MIME type of the parser to open.
    aDocumentMimeType is the MIME type of the document to parse.
    aCallback is a reference to the call-back so the parser can report events.

void OpenL(MDataSupplierReader& aReader,
           const TDesC8& aMarkupMimeType,
           const TDesC8& aDocumentMimeType,
           MMarkupCallback& aCallback,
           RMarkupPlugins aPlugins)
    Opens a parser session.
    aReader is the data supplier reader to use during parsing.
    aMarkupMimeType is the MIME type of the parser to open.
    aDocumentMimeType is the MIME type of the document to parse.
    aCallback is a reference to the call-back so the parser can report events.
    aPlugins is an array of plug-ins to use with the parser. The first plug-in
    in the list is the first plug-in to be called back from the parser. The
    first plug-in will then call back to the second plug-in, etc.


void OpenL(
MDataSupplierReader& aReader,
const TDesC8& aMarkupMimeType,
const TDesC8& aDocumentMimeType,
MMarkupCallback& aCallback,
RMarkupPlugins aPlugins,
RNamespaceCollection
aNamespaceCollection)


Opens a parser session.

aReader is the data supplier reader to use during parsing.
aMarkupMimeType is the MIME type of the parser to
open.

aDocumentMimeType is the MIME type of the
document to parse.

aCallback is a reference to the call-back so the parser can
report events.

aPlugins is an array of plug-ins to use with the parser. The
first plug-in in the list is the first plug-in to be called back
from the parser. The first plug-in will then call back to the
second plug-in, and so on.

aNamespaceCollection is a handle to a previous
namespace collection. This is useful if a generator session or
another parser session has already been created, so that the same
namespace collection can be shared.


void Close()


Closes the parser session.


void Start()


Starts parsing the document.


void Stop()


Stops parsing the document.


void Reset(
MDataSupplierReader& aReader,
MMarkupCallback& aCallback)


Resets the parser ready to parse a new document.
aReader is the data supplier reader to use during parsing.
aCallback is a reference to the call-back so the parser can
report events.


TInt SetParseMode(
TInt aParseMode)


Selects one or more parse modes.
aParseMode is one or more of the following:

EConvertTagsToLowerCase - Converts elements and
attributes to lowercase. This can be used for
case-insensitive HTML so that a tag can be matched to a
static string in the string pool.

EErrorOnUnrecognisedTags - Reports an error
when unrecognised tags are found.

EReportUnrecognisedTags - Reports unrecognised
tags.

EReportNamespaces - Reports the namespace.

EReportNamespacePrefixes - Reports the namespace
prefix.

ESendFullContentInOneChunk - Sends all content
data for an element in one chunk.

EReportNameSpaceMapping - Reports namespace
mappings via the OnStartPrefixMappingL() and
OnEndPrefixMappingL() call-backs.

If this function is not called, the default will be:
EReportUnrecognisedTags | EReportNamespaces

If the parsing mode is not supported, KErrNotSupported is
returned.



RGeneratorSession



The following is the public API for this class:



Method


Description


void OpenL(
MDataSupplierWriter& aWriter,
TUid aMarkupMimeType,
const TDesC8& aDocumentMimeType)


Opens a generator session.

aWriter is the data supplier writer used to generate a
document.

aMarkupMimeType is the MIME type of the generator to
open.

aDocumentMimeType is the MIME type of the
document to generate.






void OpenL(
MDataSupplierWriter& aWriter,
TUid aMarkupMimeType,
const TDesC8& aDocumentMimeType,
RMarkupPlugins aPlugins)


Opens a generator session.

aWriter is the data supplier writer used to generate a
document.

aMarkupMimeType is the MIME type of the generator to
open.

aDocumentMimeType is the MIME type of the
document to generate.

aPlugins is an array of plug-ins to use with the generator.


void OpenL(
MDataSupplierWriter& aWriter,
TUid aMarkupMimeType,
const TDesC8& aDocumentMimeType,
RMarkupPlugins aPlugins,
RNamespaceCollection
aNamespaceCollection)


Opens a generator session.
aWriter is the data supplier writer used to generate a
document.

aMarkupMimeType is the MIME type of the generator to
open.

aDocumentMimeType is the MIME type of the
document to generate.

aPlugins is an array of plug-ins to use with the generator.
aNamespaceCollection is a handle to a previous
namespace collection. This is useful if a parser session or
another generator session has already been created, so that the
same namespace collection can be shared.


void Close()


Closes the generator session.




void Reset(
MDataSupplierWriter& aWriter,
MMarkupCallback& aCallback)


Resets the generator ready to generate a new document.
aWriter is the data supplier writer used to generate a
document.

aCallback is a reference to the call-back so the generator
can report events.




void BuildStartDocumentL(
RDocumentParameters aDocParam)


Builds the start of the document.
aDocParam specifies the various parameters of the
document. In the case of WBXML this would specify the
public ID and string table.


void BuildEndDocumentL()


Builds the end of the document. 


void BuildStartElementL( 
RTagInfo& aElement, 
RAttributeArray& aAttributes) 


Builds the start element with attributes and namespace if 
specified. 

aElement is a handle to the element's details. 
aAttributes contains the attributes for the element.


void BuildEndElementL( 

RTagInfo& aElement)


Builds the end of the element. 

aElement is a handle to the element's details. 


void BuildContentL( 

const TDesC8& aContentPart)


Builds part or all of the content. Large content should be
built in chunks, i.e. this function should be called many
times, once for each chunk.

aContentPart is the raw content data. This data must be converted
to the correct character set by the client.




void BuildPrefixMappingL( 
RString& aPrefix, 
RString& aUri) 


Builds a prefix-URI namespace mapping for the next element to be
built. This method can be called for each namespace that
needs to be declared.

aPrefix is the Namespace prefix being declared. 
aUri is the Namespace URI the prefix is mapped to. 


void BuildProcessingInstructionL( 
RString& aTarget, 
RString& aData) 


Builds a processing instruction.

aTarget is the processing instruction target.

aData is the processing instruction data. 


RTagInfo

The following is the public API for this class: 


Method 


Description


void Open( 
RString& aUri,
RString& aPrefix,
RString& aLocalName)


Sets the tag information for an element or attribute. 
aUri is the URI of the namespace. 
aPrefix is the prefix of the qualified name. 
aLocalName is the local name of the qualified name. 


void Close()


Closes the tag information.


RString& Uri()


Returns the URI.


RString& LocalName()


Returns the local name.


RString& Prefix()


Returns the prefix.


RNamespaceCollection 

The following is the public API for this class: 


Method


Description


void Connect()


Every time this method is called a reference counter is 
incremented so that the namespace collection is only 
destroyed when no clients are using it. 


void Close()


Every time this method is called a reference counter is 
decremented and the object is destroyed only when the 
reference counter is zero. 
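The Connect()/Close() lifetime rule can be illustrated with a minimal sketch in standard C++. The class name NamespaceCollectionSketch and its RefCount()/IsDestroyed() accessors are hypothetical; a real RNamespaceCollection would release its plug-ins and string pool rather than set a flag.

```cpp
#include <cassert>

// Hypothetical stand-in for the RNamespaceCollection lifetime rule: Connect()
// increments a reference count, Close() decrements it, and the shared state is
// released only when the count drops back to zero.
class NamespaceCollectionSketch {
public:
    void Connect() { ++iRefCount; }

    // Returns true if this call released the collection.
    bool Close() {
        assert(iRefCount > 0 && "Close() called without a matching Connect()");
        if (--iRefCount == 0) {
            iDestroyed = true;  // a real implementation might free plug-ins and the string pool here
            return true;
        }
        return false;
    }

    int RefCount() const { return iRefCount; }
    bool IsDestroyed() const { return iDestroyed; }

private:
    int iRefCount = 0;
    bool iDestroyed = false;
};
```

In the sharing case described for OpenL(), a parser session and a generator session would each call Connect() on the same collection, and the collection would survive until both had called Close().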


const CMarkupNameSpace&
OpenNamespaceL(
const TDesC8& aMimeType)


Opens a namespace plug-in and returns a reference to the
namespace plug-in. If the namespace plug-in is not loaded, it
will be loaded automatically.

aMimeType is the MIME type of the plug-in to open. 


const CMarkupNameSpace&
OpenNamespaceL(
TUint8 aCodePage)


Opens a namespace plug-in and returns a reference to the 
namespace plug-in. 

aCodePage is the code page of the plug-in to open. 


void Reset()


Resets the namespace collection and string pool. 


RStringPool StringPool()


Returns a handle to the string pool object. 


CMarkupNamespace 

The following is the API for this class:


Method


Description


void NewL(RStringPool 
aStringPool) 


Creates the namespace plug-in. 

aStringPool is a handle to the string pool to which the static
string tables are added.


RString& Element(
TUint8 aWbxmlToken) const


Returns a handle to the string. 

aWbxmlToken is the WBXML token of the element. 


void AttributeValuePair(
TUint8 aWbxmlToken,
RString& aAttribute,
RString& aValue) const


Returns a handle to the attribute and value strings.
aWbxmlToken is the WBXML token of the attribute.
aAttribute is the handle to the attribute string.
aValue is the handle to the value string.



Plug-In Chains
RString& AttributeValue(
TUint8 aWbxmlToken) const


Returns a handle to an attribute value.
aWbxmlToken is the WBXML token of the attribute.




RString& NamespaceUri() const


Returns the namespace name. 




TUint8 CodePage() const


Returns the code page for this namespace.


RTableCodePage 

The following is the API for this class: 




Method 


Description 




RString NameSpaceUri()


Returns the namespace URI for this code page. 




TInt StringPoolIndexFromToken(
TInt aToken);


Gets a StringPool index from a token value. -1 is returned if
the item is not found.




TInt TokenFromStringPoolIndex(
TInt aIndex);


Gets a token value from a StringPool index. -1 is returned if 
the item is not found. 
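The two RTableCodePage lookups amount to a bidirectional mapping between WBXML tokens and string pool indices. The sketch below assumes the code page is held as a simple list of (token, index) pairs searched linearly; the class name and the example values (the Appendix B element tokens 5 to 8 mapped to indices 0 to 3) are illustrative assumptions, not the framework's actual storage.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sketch of RTableCodePage's two lookups. Each entry pairs a
// WBXML token with a string pool index; both lookups return -1 on a miss,
// matching the convention described above.
class TableCodePageSketch {
public:
    explicit TableCodePageSketch(std::vector<std::pair<int, int>> aTokenToIndex)
        : iEntries(std::move(aTokenToIndex)) {}

    int StringPoolIndexFromToken(int aToken) const {
        for (const auto& entry : iEntries)
            if (entry.first == aToken) return entry.second;
        return -1;  // token not defined on this code page
    }

    int TokenFromStringPoolIndex(int aIndex) const {
        for (const auto& entry : iEntries)
            if (entry.second == aIndex) return entry.first;
        return -1;  // string has no token on this code page
    }

private:
    std::vector<std::pair<int, int>> iEntries;
};
```

A parser would use the first lookup to turn an incoming WBXML token into a string pool handle; a generator would use the second to encode a pooled string back into a token.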



CMarkupPluginBase

The following is the API for this ECOM class:



Method


Description


CMarkupPluginBase& RootPlugin()


Returns a reference to the root plug-in. This must be either
a parser or generator plug-in.


CMarkupPluginBase&
ParentPlugin()


Returns a reference to the parent plug-in.


RElementStack& ElementStack()


Returns a handle to the element stack.


RNameSpaceCollection&
NamespaceCollection()


Returns a handle to the namespace collection.


CMarkupCharSetConverter&
CharSetConverter()


Returns a reference to the character set converter object.


TBool IsChildElementValid( 
RString& aParentElement, 
RString& aChildElement) 


Checks if the aChildElement is a valid child of 
aParentElement. 


CMarkupPlugin 

The following is the API for this ECOM class: 


Method 


Description


CMarkupPlugin* NewL(
MMarkupCallback& aCallback)


Creates an instance of a mark-up plug-in. 

aCallback is a reference to the call-back to report events. 


void SetParent( 

CMarkupPluginBase* 

aParentPlugin) 


Sets the parent plug-in for this plug-in. 
aParentPlugin is a pointer to the parent plug-in, or NULL
if there is no parent. A parser or generator does not have a
parent, so this must not be set for them; the default NULL
indicates that there is no parent.



CParserSession 

The following is the API for this ECOM class: 



Method 



Description



CParserSession* NewL(
MDataSupplierReader& aReader,
const TDesC8& aMarkupMimeType,
const TDesC8& aDocumentMimeType,
MMarkupCallback& aCallback,
RNamespaceCollection* aNamespaceCollection,
CMarkupCharSetConverter& aCharSetConverter)


Opens a parser session.

aReader is the data supplier reader to use during parsing.
aMarkupMimeType is the MIME type of the parser to
open.

aDocumentMimeType is the MIME type of the
document to parse.
aCallback is a reference to the call-back so the parser can
report events.

aNamespaceCollection is a handle to a previous
namespace collection. Set to NULL if a new
RNamespaceCollection is to be used.
aCharSetConverter is a reference to the character set
conversion class.


void Start()


Starts parsing the document.


void Stop()


Stops parsing the document.



void Reset( 
MDataSupplierReader& aReader, 
MMarkupCallback& aCallback) 



Resets the parser ready to parse a new document. 
aReader is the data supplier reader to use during parsing. 
aCallback is a reference to the call-back so the parser can
report events.



void SetParseMode(
TInt aParseMode)



Selects one or more parse modes. 

See RParserSession for details on aParseMode. 
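The parse modes are described as combinable, with a default of EReportUnrecognisedTags | EReportNamespaces, which suggests bit flags. The sketch below assumes one bit per mode; the numeric values, the helpers IsModeSet() and ValidateParseMode(), and the local constants KErrNone (0) and KErrNotSupported (-5, the value Symbian uses) are defined here only for illustration and are not taken from this document.

```cpp
#include <cassert>

// Hypothetical bit values for the parse modes listed under RParserSession;
// the real enumeration values are not given in this document. Each mode uses
// a distinct bit so that several can be combined with '|'.
enum TParseModeSketch {
    EConvertTagsToLowerCase    = 0x01,
    EErrorOnUnrecognisedTags   = 0x02,
    EReportUnrecognisedTags    = 0x04,
    EReportNamespaces          = 0x08,
    EReportNamespacePrefixes   = 0x10,
    ESendFullContentInOneChunk = 0x20,
    EReportNameSpaceMapping    = 0x40
};

// Default used when SetParseMode() is never called, as stated above.
const int KDefaultParseMode = EReportUnrecognisedTags | EReportNamespaces;

// Local stand-ins for the Symbian error codes.
const int KErrNone = 0;
const int KErrNotSupported = -5;
const int KAllKnownModes = 0x7F;  // union of all bits defined above

inline bool IsModeSet(int aParseMode, TParseModeSketch aMode) {
    return (aParseMode & aMode) != 0;
}

// Mirrors the KErrNotSupported rule: any bit outside the known set is rejected.
inline int ValidateParseMode(int aParseMode) {
    return (aParseMode & ~KAllKnownModes) ? KErrNotSupported : KErrNone;
}
```

A client wanting lowercase tag matching on top of the default would pass KDefaultParseMode | EConvertTagsToLowerCase.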
CGeneratorSession

The following is the API for this ECOM class:



Method


Description


void OpenL(
MDataSupplierWriter& aWriter,
TUid aMarkupMimeType,
const TDesC8& aDocumentMimeType,
MMarkupCallback& aCallback,
RNamespaceCollection* aNamespaceCollection,
CMarkupCharSetConverter& aCharSetConverter)


Opens a generator session.

aWriter is the data supplier writer used to generate a
document.

aMarkupMimeType is the MIME type of the generator to
open.

aDocumentMimeType is the MIME type of the
document to generate.

aCallback is a reference to the call-back so the generator
can report events.

aNamespaceCollection is a handle to a previous
namespace collection. Set to NULL if a new
RNamespaceCollection is to be used.
aCharSetConverter is a reference to the character set
conversion class.


void Reset( 

MDataSupplierWriter& aWriter, 
MMarkupCallback& aCallback) 


Resets the generator ready to generate a new document. 
aWriter is the data supplier writer used to generate a
document.

aCallback is a reference to the call-back so the generator 
can report events. 


void BuildStartDocumentL( 

RDocumentParameters 

aDocParam); 


Builds the start of the document. 

aDocParam specifies the various parameters of the 
document. 


void BuildEndDocumentL()


Builds the end of the document. 


void BuildStartElementL(
RTagInfo& aElement,
RAttributeArray& aAttributes)


Builds the start element with attributes and namespace if
specified.

aElement is a handle to the element's details. 
aAttributes contains the attributes for the element. 


void BuildEndElementL( 
RTagInfo& aElement)


Builds the end of the element.

aElement is a handle to the element's details. 


void BuildContentL(
const TDesC8& aContentPart)


Builds part or all of the content. Large content should be
built in chunks, i.e. this function should be called many
times, once for each chunk.

aContentPart is the raw content data. This data must be converted
to the correct character set by the client.


void BuildProcessingInstructionL(
RString& aTarget,
RString& aData) 


Builds a processing instruction.

aTarget is the processing instruction target. 

aData is the processing instruction data. 



RAttribute 



The following is the API for this class:



Method


Description


RTagInfo& Attribute()


Returns a handle to the attribute's name information.


TAttributeType Type()


Returns the attribute's type, where TAttributeType is one
of the following enumeration values:

CDATA 

ID 

IDREF 

IDREFS 

NMTOKEN 

NMTOKENS 

ENTITY 

ENTITIES 

NOTATION 


RString& Value()


Returns a handle to the attribute value. If the attribute value
is a list of tokens (IDREFS, ENTITIES or NMTOKENS),
the tokens will be concatenated into a single RString with 
each token separated by a single space. 
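A minimal sketch of the joining rule for list-valued attributes, using std::string in place of RString; the function name JoinAttributeTokens() is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins list-valued attribute tokens (IDREFS, ENTITIES, NMTOKENS) into one
// string with exactly one space between tokens, as Value() is described to do.
// std::string stands in for RString.
std::string JoinAttributeTokens(const std::vector<std::string>& aTokens) {
    std::string value;
    for (std::size_t i = 0; i < aTokens.size(); ++i) {
        if (i > 0) value += ' ';  // single separator, none before the first token
        value += aTokens[i];
    }
    return value;
}
```

Because the separator is always a single space, a client that needs the individual tokens back can split the value on spaces.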
MDataSupplierReader

The following is the API for this mix-in class:



TUint8 GetByteL()


Gets a single byte from the data supplier.


const TDesC8& GetBytesL(
TInt aNumberOfBytes)


Gets aNumberOfBytes bytes from the data supplier. If the requested
number of bytes is not available, this method leaves with KErrEof.
The returned descriptor must not be deleted until another
call to GetBytesL() or EndTransactionL() is made.


void StartTransactionL()


The parser calls this to indicate the start of a transaction.


void EndTransactionL()


The parser calls this to indicate the transaction has ended.
Any data stored for the transaction may now be deleted.


void RollbackL()


The parser calls this to indicate that the transaction must be rolled
back to the exact state it was in when StartTransactionL() was
called.
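The transaction contract of MDataSupplierReader can be illustrated with a hypothetical in-memory reader. Standard C++ exceptions stand in for Symbian leaves (std::out_of_range plays the role of leaving with KErrEof), and GetBytesL() here returns a copy rather than a reference to an internal descriptor, so the descriptor lifetime rule above does not arise in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical in-memory reader: StartTransactionL() remembers the read
// position, RollbackL() restores it, EndTransactionL() commits the reads.
class DescriptorReaderSketch {
public:
    explicit DescriptorReaderSketch(std::string aData) : iData(std::move(aData)) {}

    std::uint8_t GetByteL() {
        if (iPos >= iData.size()) throw std::out_of_range("end of data");
        return static_cast<std::uint8_t>(iData[iPos++]);
    }

    std::string GetBytesL(std::size_t aNumberOfBytes) {
        if (iData.size() - iPos < aNumberOfBytes) throw std::out_of_range("end of data");
        std::string bytes = iData.substr(iPos, aNumberOfBytes);
        iPos += aNumberOfBytes;
        return bytes;
    }

    void StartTransactionL() { iMark = iPos; iInTransaction = true; }

    // Committed: data before iPos could now be discarded by a real supplier.
    void EndTransactionL() { iInTransaction = false; }

    void RollbackL() {
        if (!iInTransaction) throw std::logic_error("no transaction in progress");
        iPos = iMark;  // back to the state at StartTransactionL()
        iInTransaction = false;
    }

private:
    std::string iData;
    std::size_t iPos = 0;
    std::size_t iMark = 0;
    bool iInTransaction = false;
};
```

One plausible use is for a parser to open a transaction before decoding a token, roll back if the data runs out part-way through, and end the transaction once the token has been fully consumed.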


MDataSupplierWriter 

The following is the API for this mix-in class:


Method 


Description 


void PutByteL(
TUint8 aByte)


Puts a byte into the data supplier.


void PutBytesL( 

const TDesC8& aBytes) 


Puts a descriptor in the data supplier. 


MMarkupCallback

The following is the API for this mix-in class:


Method 


Description 


void OnStartDocumentL( 

RDocumentParameters aDocParam,
TInt aErrorCode);


Callback to indicate the start of the document.
aDocParam specifies the various parameters of the
document.

aErrorCode is the error code. If this is not KErrNone then
special action may be required.


void OnEndDocumentL( 
TInt aErrorCode);


Indicates the end of the document has been reached.
aErrorCode is the error code. If this is not KErrNone then
special action may be required.


void OnStartElementL(
RTagInfo& aElement,
RAttributeArray& aAttributes,
TInt aErrorCode);


Callback to indicate an element has been parsed.
aElement is a handle to the element's details.
aAttributes contains the attributes for the element.
aErrorCode is the error code. If this is not KErrNone then
special action may be required.



void OnEndElementL( 
RTagInfo& aElement, 
TInt aErrorCode);



Callback to indicate the end of the element has been 
reached. 

aElement is a handle to the element's details. 

aErrorCode is the error code. If this is not KErrNone then 

special action may be required. 



void OnContentL( 
const TDesC8& aBytes, 
TInt aErrorCode)



Sends the content of the element. Not all the content may
be returned in one go; the data may be sent in chunks.
When an OnEndElementL is received, there is
no more content to be sent.

aBytes is the raw content data for the element. The client is
responsible for converting the data to the required character
set if necessary. In some instances with WBXML opaque
data the content may be binary and must not be converted.
aErrorCode is the error code. If this is not KErrNone then
special action may be required.
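Because OnContentL() may deliver an element's content in several chunks, a client typically buffers the chunks until OnEndElementL() arrives (unless ESendFullContentInOneChunk was selected). The handler below is a hypothetical sketch with simplified types: std::string for the descriptors and RTagInfo, int for error codes, with 0 standing in for KErrNone. It keeps a single buffer, so it only suits elements whose content is not interrupted by child elements; mixed content would need one buffer per open element.

```cpp
#include <cassert>
#include <string>

// Hypothetical client-side handler that reassembles chunked content.
class ContentCollectorSketch {
public:
    void OnStartElementL(const std::string& /*aElement*/, int /*aErrorCode*/) {
        iBuffer.clear();  // new element: start a fresh buffer
    }

    void OnContentL(const std::string& aBytes, int aErrorCode) {
        if (aErrorCode == 0) iBuffer += aBytes;  // 0 stands in for KErrNone
    }

    void OnEndElementL(const std::string& aElement, int /*aErrorCode*/) {
        iLastElement = aElement;
        iLastContent = iBuffer;  // no more chunks will follow for this element
        iBuffer.clear();
    }

    const std::string& LastContent() const { return iLastContent; }
    const std::string& LastElement() const { return iLastElement; }

private:
    std::string iBuffer;
    std::string iLastElement;
    std::string iLastContent;
};
```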



void OnStartPrefixMappingL( 
RString& aPrefix, 
RString& aUri, 
TInt aErrorCode)



Notification of the beginning of the scope of a prefix-URI
Namespace mapping. This method is always called before
the corresponding OnStartElementL method.
aPrefix is the Namespace prefix being declared.
aUri is the Namespace URI the prefix is mapped to.
aErrorCode is the error code. If this is not KErrNone then
special action may be required.



void OnEndPrefixMappingL( 
RString& aPrefix, 
TInt aErrorCode)



Notification of the end of the scope of a prefix-URI
mapping. This method is called after the corresponding
OnEndElementL method.

aPrefix is the Namespace prefix that was mapped.
aErrorCode is the error code. If this is not KErrNone then
special action may be required.
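On the client side, the scoping rule of the two prefix-mapping call-backs can be modelled with a stack of URIs per prefix, so that a nested re-declaration of a prefix is undone when its scope ends. This is a hypothetical sketch with std::string in place of RString; the class name, ResolveUri() and the example URIs are assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical namespace context driven by OnStartPrefixMappingL() and
// OnEndPrefixMappingL(). Each prefix keeps a stack of URIs so that the end of
// an inner scope restores the outer binding.
class NamespaceContextSketch {
public:
    void OnStartPrefixMappingL(const std::string& aPrefix, const std::string& aUri) {
        iBindings[aPrefix].push_back(aUri);
    }

    void OnEndPrefixMappingL(const std::string& aPrefix) {
        auto it = iBindings.find(aPrefix);
        if (it != iBindings.end() && !it->second.empty()) {
            it->second.pop_back();
            if (it->second.empty()) iBindings.erase(it);
        }
    }

    // Returns the URI currently bound to aPrefix, or an empty string if unbound.
    std::string ResolveUri(const std::string& aPrefix) const {
        auto it = iBindings.find(aPrefix);
        return it == iBindings.end() ? std::string() : it->second.back();
    }

private:
    std::map<std::string, std::vector<std::string>> iBindings;
};
```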



void OnIgnoreableWhiteSpaceL(
const TDesC8& aBytes,
TInt aErrorCode)


Notification of ignorable whitespace in element content.
aBytes are the ignored bytes from the document being
parsed.
aErrorCode is the error code. If this is not KErrNone then
special action may be required.


void OnSkippedEntityL( 
RString& aName, 
TInt aErrorCode)


Notification of a skipped entity. If the parser encounters an
external entity, it does not need to expand it; it can return
the entity as aName for the client to deal with.

aName is the name of the skipped entity. 

aErrorCode is the error code. If this is not KErrNone then 

special action may be required. 


void OnProcessingInstructionL( 
const TDesC8& aTarget,
const TDesC8& aData,
TInt aErrorCode)


Receive notification of a processing instruction.

aTarget is the processing instruction target.

aData is the processing instruction data. If empty, none was
supplied.

aErrorCode is the error code. If this is not KErrNone then
special action may be required.


void OnOutOfDataL()


There is no more data in the data supplier to parse. If there
is more data to parse, Start() should be called once there is
more data in the supplier to continue parsing.


void OnError(TInt aError) 


An error has occurred, where aError is the error code.



Sequence Diagrams 

Setting up, parsing and generating 

The sequence diagram for this scenario shows the interaction of the client with the various
parser objects to create a parser and generator session. The parsing of a simple document
with only one element and the generation of one element are shown. It is assumed that a DTD
validator and an auto correct component are used. Auto correction in this example is only
used with the parser. The generator only checks that tags are DTD compliant but does
not try to correct any DTD errors.



Element not valid at current level in DTD 

Auto correction is left up to the plug-in implementers to decide how and what should be 
corrected. 



The sequence diagram in Figure 4 shows an example of what is possible in the case
where the format of the document is valid; however, there is an invalid element (C) that
should be at a different level, as shown in the example document below:



<A>Content 
<B> 

<C> // Not valid for the DTD, should be a root element.
Some content 
</C> 

</B> 

</A> 

// <C> should go here 

The bad element is detected by the DTD validator and sent to the auto correct
component. The auto corrector realises that this element has an error from the error
code passed in the call-back, tries to find out where the element should go, and sends
back the appropriate OnEndElementL() call-backs to the client.



Scenarios 

Set-up a parser to parse WBXML without any plug-ins.

Scenario to parse the following document:



<A>
<B>
Content
</B>
</A>

1. The client creates a data supplier that contains the data to be parsed.

2. The client creates an RParserSession passing in the data supplier, MIME type for
WBXML, the MIME type of the document to be parsed and the call-back pointer 
where parsing events are to be received. 

3. The client begins the parsing by calling Start() on the parser session.

4. The parser makes the following call-backs to the client:

OnStartDocumentL()
OnStartElementL('A')
OnStartElementL('B')
OnContentL('Content')
OnEndElementL('B')
OnEndElementL('A')
OnEndDocumentL()

Set-up a parser to parse WBXML with a validator plug-in 
The same document as 5.1 is used in this scenario. 

1. The client creates a data supplier that contains the data to be parsed.

2. The client constructs an RMarkupPlugins object with the UID of a validator.

3. The client creates an RParserSession passing in the data supplier, MIME type for
WBXML, the MIME type of the document to be parsed, call-back pointer where
parsing events are to be received and the array of plug-ins object.

4. The parser session first iterates through the array of plug-ins starting from the end of
the list. It creates the CValidator ECOM object, setting its call-back to the client.
The CWbxmlParser ECOM object is created next and its call-back is set to the
CValidator object. This sets up the chain of call-back events from the parser
through to the validator and then the client. The validator needs access to data from
the parser, so SetParent needs to be called on all the plug-ins in the array. The
validator sets its parent to the parser object.

5. The client begins the parsing by calling Start() on the parser session.

6. The parser makes the following call-backs to the client:

OnStartDocumentL()
OnStartElementL('A')
OnStartElementL('B')
OnContentL('Content')
OnEndElementL('B')
OnEndElementL('A')
OnEndDocumentL()
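The chain set-up in step 4 can be sketched in standard C++ with a single event type. MarkupCallback, PluginSketch, ClientSketch, EventLog and BuildChain() are hypothetical stand-ins for MMarkupCallback, the ECOM plug-ins and the parser session; ownership is held in a vector of std::unique_ptr, and SetParent() is not modelled.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Plays the role of MMarkupCallback; only one event type is modelled.
struct MarkupCallback {
    virtual ~MarkupCallback() = default;
    virtual void OnStartElementL(const std::string& aElement) = 0;
};

// Records which link saw an event, in order.
struct EventLog { std::vector<std::string> iEntries; };

// A plug-in is itself a callback that forwards every event to the next link.
class PluginSketch : public MarkupCallback {
public:
    PluginSketch(std::string aName, MarkupCallback& aNext, EventLog& aLog)
        : iName(std::move(aName)), iNext(aNext), iLog(aLog) {}
    void OnStartElementL(const std::string& aElement) override {
        iLog.iEntries.push_back(iName + ":" + aElement);
        iNext.OnStartElementL(aElement);  // pass the event down the chain
    }
private:
    std::string iName;
    MarkupCallback& iNext;
    EventLog& iLog;
};

class ClientSketch : public MarkupCallback {
public:
    explicit ClientSketch(EventLog& aLog) : iLog(aLog) {}
    void OnStartElementL(const std::string& aElement) override {
        iLog.iEntries.push_back("client:" + aElement);
    }
private:
    EventLog& iLog;
};

// Walks the plug-in list from the end, so the last plug-in calls back the
// client. Returns the link the parser should call, i.e. the first plug-in.
MarkupCallback& BuildChain(const std::vector<std::string>& aPluginNames,
                           MarkupCallback& aClient, EventLog& aLog,
                           std::vector<std::unique_ptr<PluginSketch>>& aOwned) {
    MarkupCallback* next = &aClient;
    for (auto it = aPluginNames.rbegin(); it != aPluginNames.rend(); ++it) {
        aOwned.push_back(std::make_unique<PluginSketch>(*it, *next, aLog));
        next = aOwned.back().get();
    }
    return *next;
}
```

Walking the list from the end is what allows each newly created plug-in to be handed a next link that already exists; the object created last is the one the parser calls first.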



Generating a WBXML document with a DTD validator
The document in 5.1 is to be generated in this scenario.

1. The client creates a data supplier with an empty buffer.

2. The client constructs an RMarkupPlugins object with the UID of a validator.

3. The client creates an RGeneratorSession passing in the data supplier, MIME type for
WBXML, the MIME type of the document to be generated and the array of plug-ins object.

4. The generator session first iterates through the array of plug-ins starting from the end
of the list. It creates the CValidator ECOM object, setting its call-back to the client.
The CWbxmlGenerator ECOM object is created next and its call-back is set to the
CValidator object. This sets up the chain of call-back events from the generator
through to the validator and then the client. The validator needs access to data from
the generator, so SetParent needs to be called on all the plug-ins in the array. The
validator sets its parent to the generator object.

5. The client then calls the following methods:
BuildStartDocumentL()
BuildStartElementL('A')
BuildStartElementL('B')
BuildContentL('Content')
BuildEndElementL('B')
BuildEndElementL('A')
BuildEndDocumentL()



Design Considerations 

• ROM/RAM Memory Strategy - the string pool is used to minimise duplicate strings.

• Error condition handling - errors are returned back to plug-ins and the client via the
call-back API.

• Localisation issues - documents can use any character set and the character set is
returned back to the client in the case of parsing so it knows how to deal with the
data. For a generator the client can set the character set of the document.

• Performance considerations - the string pool makes string comparisons efficient.

• Platform Security - in normal usage the parser and generator do not need any
capabilities. However, if a plug-in were designed to load a DTD from the Internet it
would require PhoneNetwork capabilities.

• Modularity - all components in the framework are ECOM components that can be
replaced or added to in the future.

Testing 

The data supplier and parser/generator set-up components can be tested individually; all
the functions are synchronous and therefore no active objects need to be created for
testing.

The following steps can be carried out to test parsing and generation of WBXML or
XML:

1. Load a pre-created file.
2. Parse the file.
3. Generate a buffer from the output of the parser.
4. Compare the output of the buffer with the original pre-created file to see if they
match.

Additional tests are carried out to test error conditions of parsing, such as badly
formatted documents and corrupt documents.
Open Issues

The following issues need to be resolved before this document is completed:
1. If a plug-in requires capabilities to connect to the Internet, what capabilities does the
framework need?

2. The API for CMarkupCharSetConverter and RDocumentParameters needs to
be decided.
Glossary 

The following technical terms and abbreviations are used within this document.

Term              Definition

XML               Extensible Markup Language
WBXML             WAP Binary Extensible Markup Language
SAX               Simple API for XML
DOM               Document Object Model
Element           This is a tag enclosed by angle brackets, e.g. <Name>, <Address>,
                  <Phone> etc.
Attributes        These are the attributes associated with an element, e.g.
                  <Phone Type="Mobile">. The attribute here is "Type".
Values            These are the actual values of an attribute, e.g.
                  <Phone Type="Mobile">. The value here is "Mobile".
Content           This is the actual content for an element, e.g.
                  <Name>Symbian</Name>. Here "Symbian" is the content for the
                  element "Name".
DTD               Document Type Definition
MIME              Multipurpose Internet Mail Extensions
Code page         Since only 32 elements can be defined in WBXML, code pages are
                  created so that each
Qualified name    A qualified name specifies prefix:local name, e.g. 'HTML:B'.
prefix            From the qualified name example this is 'HTML'.
local name        From the qualified name example this is 'B'.
Appendix A - Auto correction examples

Table A1 shows a situation where the end tags are the wrong way round for A and B.
This is very easy to fix: since the DTD validator keeps a stack of the tags, it knows what
the end tag should be.



<A>Content 
<B> 

More content 
</A> 

</B> 



Table A1: End tags that are the wrong way round

Table A2 shows the situation where the B end tag is missing. Since the end tag does not
match, a guess can be made that there should be an end tag for B before the end tag of A.



<A>Content
<B>

More content

</A>



Table A2: Missing end tag
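The repairs described for Tables A1 and A2 can be driven entirely by the validator's element stack. The function below is a hypothetical sketch of one such strategy, not the framework's algorithm: an end tag that matches an element lower in the stack closes every element above it, and an end tag that matches nothing open is dropped. The name RepairEndTag() and the std::vector<std::string> stack (outermost element first) are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns the end-element events to send to the client, innermost first,
// and pops the closed elements off aOpenElements.
std::vector<std::string> RepairEndTag(std::vector<std::string>& aOpenElements,
                                      const std::string& aEndTag) {
    std::vector<std::string> events;
    // Look for the innermost open element with this name.
    auto match = std::find(aOpenElements.rbegin(), aOpenElements.rend(), aEndTag);
    if (match == aOpenElements.rend()) return events;  // stray end tag: ignore it

    // Close everything above the matching element, then the element itself.
    while (!aOpenElements.empty()) {
        std::string top = aOpenElements.back();
        aOpenElements.pop_back();
        events.push_back(top);
        if (top == aEndTag) break;
    }
    return events;
}
```

For Table A1, </A> arriving while A and B are open produces end events for B and then A, and the later </B> is discarded because nothing is open; for Table A2, the same call supplies the missing </B>.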



Table A3 shows the situation where there are no end tags for A and B. The DTD
validator will detect the problem and send an end tag for B to the client. The auto correct
component will query the DTD validator whether the C tag is valid for the parent element A. If
it is valid, an OnStartElementL() will be sent to the client; otherwise the auto correct
component can check further up the element stack to find where this element is valid. If
it is not valid anywhere in the stack, then it will be ignored together with any content and
end element tag.

<A>Content

<B> 

More content 
<C> 

Some content 
</C> 



Table A3: Missing end tags
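The search described for Table A3 can be expressed as a walk up the element stack using a child-validity test such as CMarkupPluginBase::IsChildElementValid(). In this hypothetical sketch the test is passed in as a std::function, the stack holds the outermost element first, and the result is the number of open elements that must be closed before the new element can start, or -1 if it is valid nowhere in the stack and should be ignored with its content.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using ChildValidFn = std::function<bool(const std::string& aParent,
                                        const std::string& aChild)>;

// Walks from the innermost open element outwards until a parent accepts
// aNewElement. Every element passed on the way must be closed first.
int FindInsertionDepth(const std::vector<std::string>& aOpenElements,
                       const std::string& aNewElement,
                       const ChildValidFn& aIsChildValid) {
    int closes = 0;
    for (auto it = aOpenElements.rbegin(); it != aOpenElements.rend(); ++it, ++closes) {
        if (aIsChildValid(*it, aNewElement)) return closes;
    }
    return -1;  // not valid anywhere in the stack: ignore element and content
}
```

With a DTD in which A may contain B and C, and B contains only text, C arriving inside B gives a result of 1: one end event for B, after which C can start inside A.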
Appendix B - How to write a namespace plug-in

The tables below show the WBXML tokens for the example namespace. Tables 1 to 3
each represent a static string table. Table 1 shows the elements for code page 0. Tables 2
and 3 hold the attribute part and the value part of the attribute-value pairs respectively.
Each attribute index in Table 2 refers to the value with the same index in Table 3. These
token values must match up in Tables 2 and 3. If an attribute does not have a value then
there must be a blank, as shown in Table 3 with token 8. Attribute values also appear in
Table 3 but have a WBXML token value of 128 or greater.



Element type name            WBXML token
Addr                         5
AddType                      6
Auth                         7
AuthLevel                    8

Table 1: ElementTable, code page 0


Attribute name/value pair    WBXML token
(attribute part)
TYPE                         6
TYPE                         7
NAME                         8
NAME                         9

Table 2: AttributeValuePairNameTable, code page 0


Attribute name/value pair    WBXML token
(value part)
ADDRESS                      6
URL                          7
                             8
BEARER                       9
GSM/CSD                      128
GSM/SMS                      129
GSM/USSD                     130

Table 3: AttributeValuePairValueTable, code page 0



The following string table files (.st) are created for each table: 



# Element table for code page 0
stringtable ElementCodePage0
EAddr Addr
EAddType AddType
EAuth Auth
EAuthLevel AuthLevel

String table for Table 1



# Attributes table for code page 0 
stringtable AttributesCodePage0
EType Type 
EType Type 
EName Name 
EName Name 



String table for Table 2 



# Attribute values table for code page 0
stringtable AttributeValuesCodePage0
EAddress Address
EURL URL
EBearer BEARER
EGSM_CSD GSM/CSD
EGSM_SMS GSM/SMS
EGSM_USSD GSM/USSD

String table for Table 3
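The relationship between Tables 2 and 3 can be illustrated with in-memory maps standing in for the generated string tables. AttributeValuePair() and AttributeValue() below mirror the CMarkupNamespace methods of the same names but return std::string copies and a bool instead of RString handles. How the real plug-in reports an unknown token is not specified in this document, so the false and empty-string results are assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>

// Code page 0 from Tables 1 to 3. Tokens below 128 index attribute-value
// pairs (Tables 2 and 3 share the token); tokens of 128 or more are
// attribute values on their own and appear only in Table 3.
const std::map<int, std::string> KElements = {
    {5, "Addr"}, {6, "AddType"}, {7, "Auth"}, {8, "AuthLevel"}};
const std::map<int, std::string> KAttributeNames = {
    {6, "TYPE"}, {7, "TYPE"}, {8, "NAME"}, {9, "NAME"}};
const std::map<int, std::string> KAttributeValues = {
    {6, "ADDRESS"}, {7, "URL"}, {8, ""}, {9, "BEARER"},
    {128, "GSM/CSD"}, {129, "GSM/SMS"}, {130, "GSM/USSD"}};

// Looks up the attribute name and preset value for a pair token.
bool AttributeValuePair(int aWbxmlToken, std::string& aAttribute, std::string& aValue) {
    if (aWbxmlToken >= 128) return false;  // value-only token, not a pair
    auto name = KAttributeNames.find(aWbxmlToken);
    auto value = KAttributeValues.find(aWbxmlToken);
    if (name == KAttributeNames.end() || value == KAttributeValues.end()) return false;
    aAttribute = name->second;
    aValue = value->second;  // empty for an attribute with no preset value
    return true;
}

// Looks up a stand-alone attribute value; only tokens >= 128 qualify.
std::string AttributeValue(int aWbxmlToken) {
    auto value = KAttributeValues.find(aWbxmlToken);
    if (aWbxmlToken < 128 || value == KAttributeValues.end()) return std::string();
    return value->second;
}
```

The blank entry for token 8 is what keeps the two tables aligned: every pair token must exist in both maps even when the attribute carries no preset value.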
Example usage of API



The following example shows how to set up the parser and generator with DTD
checking and auto correction.

RMarkupPlugins plugins;
plugins.Append(KMyValidator);
plugins.Append(KMyAutoCorrector);
CDescriptorDataSupplier* dataSupplier = CDescriptorDataSupplier::NewLC();
RParserSession parser;

// OpenL() takes the data supplier by reference
parser.OpenL(*dataSupplier, MarkupMimeType, DocumentMimeType, callback, plugins);
parser.Start();

// Callback events will be received
parser.Close();

// Now construct a generator using the same plug-ins and data supplier
RGeneratorSession generator;

generator.OpenL(*dataSupplier, MarkupMimeType, DocumentMimeType, callback,
    plugins);

generator.BuildStartDocumentL();

RAttributeArray attributes;

// Get an RString from the ElementStringTable
RString string = generator.StringPool().String(ElementStringTable::T^^, ElementStringTable);

// Build one element with content
generator.BuildStartElementL(string, attributes);
generator.BuildContentL(_L8("This is the content"));
generator.BuildEndElementL(string);
generator.BuildEndDocumentL();
generator.Close();
CleanupStack::PopAndDestroy(dataSupplier); // balances NewLC()
Appendix 2

How the String Pool is used to parse both text and binary mark-up language 

The Mark-up Language framework design relies on the fact that it is possible (using the 'String Pool' techniques described below) to provide the same interface to clients regardless of whether text or binary mark-up language is used.

Text-based mark-up languages use strings, i.e. sequences of characters or binary data. In the String Pool technique, static tables of these strings are created at compile time, with one string table per namespace, for all the elements, attributes and attribute values needed to describe a particular type of mark-up document. Each element, attribute and attribute value is assigned an integer number, and these integer 'handles' form an index of the strings. A string in an XML document can be rapidly compared to all strings in the string table by the efficient process of comparing the integer representation of the string with all of the integer handles in the static string table. The main benefit of using a string pool for parsing is therefore that it makes it easy and efficient for the client to check what is being parsed, since handles to strings are used instead of actual strings. This means only integers are compared, rather than many characters as would be the normal case if string pools were not used. Comparisons can also be carried out in a simple switch statement, making the code efficient and easier to read and maintain. Hence, the string pool makes string comparisons efficient at the expense of the cost of creating the strings.

For binary mark-up languages (e.g. WBXML) the situation is more complex, since there are no strings in WBXML. In WBXML, everything is tokenised (i.e. given a token number). We get around the absence of strings as follows: a table of mappings of each of the WBXML tokens to the index of the string in the string table is created (see Figure 8). Each mapping is given a unique integer value, a handle. Since it is required to map from tokens to strings and vice versa, two lists of integer value handles are created: one indexed on tokens and the other indexed on the position in the string table. This makes it quick to map from one type to the other. All this is encapsulated in the namespace plug-in and is therefore insulated from the client, parser and generator. The client can therefore parse a binary or text document without having to know about the specific format: it simply uses the integer handle (RString), which will work correctly for both text and binary mark-up languages.
CLAIMS



1. A portable computing device programmed with a mark-up language parser or generator that can access components to validate, pre-filter or alter data, in which the components are plug-in components that operate using a chain of responsibility design pattern.

2. The device of Claim 1 in which the plug-in components all present a common, generic API to the parser or generator, enabling the same plug-in to be used with different types of parsers and generators.

3. The device of Claim 1 in which the plug-in components all present a common, generic API to a client component using the parser or generator, enabling the same plug-ins to be used by different clients.

4. The device of any preceding claim in which the parser notifies a validator plug-in of elements it is parsing and these in turn go to an auto correction plug-in to be fixed if required and finally a client receives these events.

5. The device of any preceding claim in which a parsed element stack is made 
available to all validation/pre-filter/altering plug-ins. 


6. The device of any preceding claim which incorporates a character conversion 
module that enables documents written in different character sets to be parsed and 
converted to a common, generic character set. 

7. A method of validating, pre-filtering or altering a mark-up language document, in which a mark-up language parser or generator accesses components to validate, pre-filter or alter data, in which the components are plug-in components that operate using a chain of responsibility design pattern.
ABSTRACT

MARK-UP LANGUAGE FRAMEWORK WITH VALIDATION 
COMPONENTS 

A portable computing device is programmed with a mark-up language parser or generator that can access components to validate, pre-filter or alter data, in which the components are plug-in components that operate using a chain of responsibility. Because of the plug-in design of the components, the system is inherently flexible and extensible compared with prior art systems in which a component (for validating, pre-filtering or altering data from a parser or generator) would be tied exclusively to a given parser.



Fig 1: Block diagram of mark-up framework with four plug-ins (components shown: Client, Parser, Plug-in 1, Plug-in 2, Plug-in 3, Plug-in 4).



Fig 2: Block diagram of a client parsing using a DTD validator and auto corrector (components shown: Client, Auto Corrector, DTD Validator, Parser; flows labelled "Client requests parser to start" and "Parser events").
Fig 3: Block diagram of a client using a generator with a DTD validator and auto corrector (components shown: Client, Generator, DTD Validator, Auto corrector; flows labelled "Client sends build requests" and "Generator events").
Fig 6: Sequence diagram for parser and generator session

Fig 7: Sequence diagram showing DTD validation and auto correction

Fig 8: WBXML tokens of elements mapping to the string table of elements in the namespace plug-in.